Optimum Complexity FFT Algorithms for RISC Processors

Authors

  • Herbert Karner
  • Martin Auer
  • Christoph W. Ueberhuber
Abstract

Modern RISC processors provide a special instruction, the fused multiply-add (FMA) instruction (±a ± b × c), to perform both a multiplication and an addition operation at the same time. In this paper, newly developed radix-2, radix-4, and split-radix FFT algorithms that optimally take advantage of this powerful instruction are presented. All floating-point operations of these algorithms are executed as FMA instructions. If a processor is provided with FMA instructions, the radix-2 FFT algorithm introduced has the lowest complexity of all Cooley-Tukey radix-2 algorithms. The new radix-4 algorithm requires 15% fewer floating-point operations and 10% fewer memory accesses than conventional Cooley-Tukey radix-4 algorithms. In general, the advantages of the FFT algorithms presented in this paper are their low complexity, their high efficiency, and their striking simplicity.
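To make the FMA idea concrete, here is a minimal sketch, not taken from the paper, of a radix-2 butterfly arranged so that every floating-point operation has the form a + b*c. It assumes the common device of factoring the cosine out of the twiddle factor w = c + i*s, so that t = s/c can be precomputed; this requires c != 0. The names `fma` and `butterfly_fma` are hypothetical, and `fma` falls back to an ordinary multiply followed by an add (two roundings instead of one) on Python versions that lack `math.fma`, which was added in Python 3.13.

```python
import cmath
import math


def fma(a, b, c):
    """Return a + b*c. Stand-in for a hardware FMA instruction (hypothetical helper)."""
    if hasattr(math, "fma"):
        # math.fma(x, y, z) computes x*y + z with a single rounding.
        return math.fma(b, c, a)
    # Fallback: same value up to rounding, but not a true fused operation.
    return a + b * c


def butterfly_fma(u, v, c, t):
    """Radix-2 butterfly (u + w*v, u - w*v) with w = c*(1 + i*t), c != 0.

    Uses exactly six operations of the form a + b*c.
    """
    # w*v = c*(p + i*q), where p = vr - t*vi and q = vi + t*vr.
    p = fma(v.real, -t, v.imag)   # FMA 1
    q = fma(v.imag, t, v.real)    # FMA 2
    x = complex(fma(u.real, c, p),    # FMA 3
                fma(u.imag, c, q))    # FMA 4
    y = complex(fma(u.real, -c, p),   # FMA 5
                fma(u.imag, -c, q))   # FMA 6
    return x, y


# Usage: twiddle factor for k = 3 in a 16-point transform (assumed values).
w = cmath.exp(-2j * math.pi * 3 / 16)
u, v = 1.5 - 0.5j, -2.0 + 0.25j
x, y = butterfly_fma(u, v, w.real, w.imag / w.real)
```

Under these assumptions the butterfly needs six fused operations, whereas the textbook form (four multiplications and six additions) needs ten separate floating-point operations. Whether this matches the exact formulation used by the authors cannot be confirmed from the abstract alone.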

Similar resources

Complex Multiplication Reduction in FFT Processors

The number of multiplications has been used as a key metric for comparing FFT algorithms since it has a large impact on the execution time and total power consumption. In this paper, we present a 16-point FFT Butterfly PE, which reduces the multiplicative complexity by using real, constant multiplications. A 1024-point FFT processor has been implemented using 16-point and 4-point Butterfly PEs...

Strong I/O Lower Bounds for Binomial and FFT Computation Graphs

Processors in most modern computing devices have several levels of memory hierarchy. To obtain good performance on these processors, it is necessary to design algorithms that minimize I/O traffic to slower memories in the hierarchy. In this paper, we propose a new technique, the boundary flow technique, for deriving lower bounds on the memory traffic complexity of problems in multi-level ...

Parallel implementation and scalability analysis of 3D Fast Fourier Transform using 2D domain decomposition

3D FFT is computationally intensive and at the same time requires global or collective communication patterns. The efficient implementation of FFT on extreme scale computers is one of the grand challenges in scientific computing. On parallel computers with a distributed memory, different domain decompositions are possible to scale 3D FFT computation. In this paper, we argue that 2D domain decom...

New Formulation and Solution in PCB Assembly Systems with Parallel Batch Processors

This paper considers the scheduling problem of parallel batch processing machines with non-identical job size and processing time. In this paper, a new mathematical model with ready time and batch size constraints is presented to formulate the problem mathematically, in which simultaneous reduction of the makespan and earliness-tardiness is the objective function. In recent years, the nature-in...

A Comparative Analysis of FFT Algorithms

With the rapid development of computer technology, general-purpose CPUs have made inroads into many signal processing applications, of which the Fast Fourier Transform (FFT) continues to be an integral part. A large number of FFT algorithms have been developed over the years, notably the Radix-2, Radix-4, Split-Radix, Fast Hartley Transform (FHT), Quick Fourier Transform (QFT), and the Decimati...

Publication date: 1998